Energy Sharing for Multiple Sensor Nodes with Finite Buffers
We consider the problem of finding optimal energy sharing policies that
maximize the network performance of a system comprising multiple sensor
nodes and a single energy harvesting (EH) source. Sensor nodes periodically
sense the random field and generate data, which is stored in the corresponding
data queues. The EH source harnesses energy from ambient energy sources and the
generated energy is stored in an energy buffer. Sensor nodes receive energy for
data transmission from the EH source. The EH source has to efficiently share
the stored energy among the nodes in order to minimize the long-run average
delay in data transmission. We formulate the problem of energy sharing between
the nodes in the framework of average cost infinite-horizon Markov decision
processes (MDPs). We develop efficient energy sharing algorithms, namely
Q-learning algorithms with exploration mechanisms based on the ε-greedy
method and on the upper confidence bound (UCB) method. We extend these algorithms by
incorporating state and action space aggregation to tackle state-action space
explosion in the MDP. We also develop a cross-entropy-based method that
incorporates policy parameterization in order to find near-optimal energy
sharing policies. Through simulations, we show that our algorithms yield energy
sharing policies that outperform the heuristic greedy method.
Comment: 38 pages, 10 figures
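The Q-learning approach described above can be sketched for the average-cost setting as relative value iteration (RVI) Q-learning with either ε-greedy or UCB exploration. This is a minimal sketch under hypothetical assumptions, not the authors' implementation: two nodes, an energy buffer of at most 4 units, data queues of at most 4 packets, one energy unit per packet, Bernoulli data arrivals, harvests of 0 to 2 units per slot, and the total backlog after transmission as the per-slot cost (a proxy for delay). The names `rvi_q_learning`, `select` and `average_cost` are illustrative; state-action aggregation and the cross-entropy method are not shown.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy model: two nodes drawing on one EH source's energy buffer.
E_MAX, Q_MAX = 4, 4  # energy-buffer and data-queue capacities (assumed)


def actions(energy):
    """All splits (a1, a2) of at most `energy` units between the two nodes."""
    return [(a1, a2) for a1 in range(energy + 1) for a2 in range(energy + 1 - a1)]


def step(state, action, rng):
    """Assumed dynamics: 1 energy unit sends 1 packet; allocated energy is consumed."""
    energy, q1, q2 = state
    a1, a2 = action
    q1 -= min(q1, a1)
    q2 -= min(q2, a2)
    cost = q1 + q2  # backlog after transmission, used as a delay proxy
    q1 = min(Q_MAX, q1 + (rng.random() < 0.5))  # Bernoulli(0.5) arrivals
    q2 = min(Q_MAX, q2 + (rng.random() < 0.5))
    energy = min(E_MAX, energy - a1 - a2 + rng.randint(0, 2))  # harvest 0..2 units
    return (energy, q1, q2), cost


def select(Q, N, state, rng, method, eps=0.1, c=1.0):
    acts = actions(state[0])
    if method == "egreedy":
        if rng.random() < eps:
            return rng.choice(acts)  # explore uniformly
        return min(acts, key=lambda a: Q[state, a])  # exploit: lowest estimated cost
    # UCB for cost minimisation: try each action once, then subtract an exploration bonus
    untried = [a for a in acts if N[state, a] == 0]
    if untried:
        return rng.choice(untried)
    n_s = sum(N[state, a] for a in acts)
    return min(acts, key=lambda a: Q[state, a] - c * math.sqrt(math.log(n_s) / N[state, a]))


def rvi_q_learning(method, steps=50_000, seed=0):
    """Average-cost (relative value iteration) Q-learning."""
    rng = random.Random(seed)
    Q, N = defaultdict(float), defaultdict(int)
    ref = ((0, 0, 0), (0, 0))  # fixed reference pair; Q[ref] is the average-cost offset
    state = (E_MAX, 0, 0)
    for _ in range(steps):
        a = select(Q, N, state, rng, method)
        nxt, cost = step(state, a, rng)
        N[state, a] += 1
        alpha = 1.0 / N[state, a] ** 0.6  # decaying per-pair step size
        target = cost + min(Q[nxt, b] for b in actions(nxt[0])) - Q[ref]
        Q[state, a] += alpha * (target - Q[state, a])
        state = nxt
    return Q


def average_cost(Q, steps=20_000, seed=1):
    """Simulate the greedy policy w.r.t. Q and return its empirical average cost."""
    rng = random.Random(seed)
    state, total = (E_MAX, 0, 0), 0
    for _ in range(steps):
        a = min(actions(state[0]), key=lambda b: Q[state, b])
        state, cost = step(state, a, rng)
        total += cost
    return total / steps


if __name__ == "__main__":
    for method in ("egreedy", "ucb"):
        print(method, round(average_cost(rvi_q_learning(method)), 3))
```

The only difference between the two exploration mechanisms is `select`: ε-greedy explores at a fixed rate, while UCB explores actions whose cost estimates rest on few visits.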
A Multi-phase Approach for Improving Information Diffusion in Social Networks
To maximize influence spread in a social network, given a certain budget
on the number of seed nodes, we investigate the effects of selecting and
activating the seed nodes in multiple phases. In particular, we formulate an
appropriate objective function for two-phase influence maximization under the
independent cascade model, investigate its properties, and propose algorithms
for determining the seed nodes in the two phases. We also study the problem of
determining an optimal budget split and delay between the two phases.
Comment: To appear in Proceedings of the 14th International Conference on
Autonomous Agents & Multiagent Systems (AAMAS), 201
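The independent cascade (IC) model and a two-phase seeding scheme can be sketched as follows. This is a hypothetical illustration, not the paper's objective function or algorithm: spread is estimated by Monte Carlo simulation, seeds in each phase are picked with the standard greedy heuristic, and the phase-1 cascade is assumed to finish (that is, the delay is long enough) before phase-2 seeds are chosen among nodes that are still inactive. The graph format `{u: [(v, p_uv), ...]}` and all function names are assumptions.

```python
import random


def simulate_ic(graph, seeds, rng, already_active=frozenset()):
    """One independent-cascade run on graph {u: [(v, p_uv), ...]}.

    Nodes in `already_active` were activated earlier and have used up their
    activation attempts; each newly activated node gets a single chance to
    activate each inactive out-neighbour v with probability p_uv.
    """
    active = set(already_active)
    frontier = [s for s in dict.fromkeys(seeds) if s not in active]
    active.update(frontier)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active


def expected_spread(graph, seeds, already_active, runs, rng):
    """Monte Carlo estimate of the expected number of active nodes."""
    return sum(len(simulate_ic(graph, seeds, rng, already_active)) for _ in range(runs)) / runs


def greedy_seeds(graph, k, already_active=frozenset(), runs=200, seed=0):
    """Standard greedy: repeatedly add the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v, _ in nbrs}
    chosen = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen) - set(already_active))
        if not candidates:
            break
        best = max(candidates,
                   key=lambda n: expected_spread(graph, chosen + [n], already_active, runs, rng))
        chosen.append(best)
    return chosen


def two_phase(graph, k1, k2, runs=200, seed=0):
    """Hypothetical two-phase scheme with budget split (k1, k2)."""
    rng = random.Random(seed)
    phase1 = greedy_seeds(graph, k1, runs=runs, seed=seed)
    observed = simulate_ic(graph, phase1, rng)  # one realised phase-1 cascade
    phase2 = greedy_seeds(graph, k2, frozenset(observed), runs=runs, seed=seed + 1)
    final = simulate_ic(graph, phase2, rng, already_active=observed)
    return phase1, phase2, final
```

Varying `k1` and `k2` for a fixed total budget is one way to explore the budget-split question on a concrete graph; the delay between phases is not modelled in this sketch.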
Energy Management in a Cooperative Energy Harvesting Wireless Sensor Network
In this paper, we consider the problem of finding an optimal energy
management policy for a network of sensor nodes capable of harvesting their own
energy and sharing it with other nodes in the network. We formulate this
problem in the discounted cost Markov decision process framework and obtain
good energy-sharing policies using the Deep Deterministic Policy Gradient
(DDPG) algorithm. Earlier works have attempted to obtain the optimal energy
allocation policy for a single sensor and for multiple sensors arranged on a
mote with a single centralized energy buffer. Our algorithms, on the other
hand, provide optimal policies for a distributed network of sensors
individually harvesting energy and capable of sharing energy amongst
themselves. Through simulations, we illustrate that the policies obtained by
our DDPG algorithm using this enhanced network model outperform algorithms that
do not share energy or use a centralized energy buffer in the distributed
multi-nodal case.
Comment: 11 pages, 4 figures
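The distributed energy-sharing model can be illustrated with a toy environment of the kind a DDPG actor would interact with: the actor's continuous output is a vector of sharing fractions, and the quantity to minimize is the discounted backlog. Everything here is a hypothetical simplification rather than the paper's model: nodes sit on a ring and each may only give energy to its clockwise neighbour, a transfer efficiency `eta` is assumed, one energy unit sends one packet, and node i harvests one unit per slot with probability (i + 1)/(n + 1). The DDPG actor and critic networks are omitted; `SharingNetwork`, `no_share` and `share_if_richer` are illustrative names.

```python
import random


class SharingNetwork:
    """Hypothetical toy model of n EH sensor nodes on a ring (not the paper's model).

    Each slot: nodes share energy, transmit, then receive data and harvest energy.
    """

    def __init__(self, n=3, b_max=5.0, q_max=10, eta=0.8, seed=0):
        self.n, self.b_max, self.q_max, self.eta = n, b_max, q_max, eta
        self.rng = random.Random(seed)
        self.battery = [b_max / 2] * n
        self.queue = [0] * n

    def step(self, share_frac):
        """share_frac[i] in [0, 1] is the fraction of node i's battery sent to
        node (i + 1) % n; only a fraction eta of it arrives. Returns the slot's cost."""
        n = self.n
        gift = [min(max(f, 0.0), 1.0) * b for f, b in zip(share_frac, self.battery)]
        for i in range(n):
            self.battery[i] -= gift[i]
        for i in range(n):
            j = (i + 1) % n
            self.battery[j] = min(self.b_max, self.battery[j] + self.eta * gift[i])
        for i in range(n):
            sent = min(self.queue[i], int(self.battery[i]))  # 1 energy unit per packet
            self.queue[i] -= sent
            self.battery[i] -= sent
        cost = sum(self.queue)  # total backlog after transmission
        for i in range(n):
            if self.rng.random() < 0.6:  # Bernoulli(0.6) data arrival
                self.queue[i] = min(self.q_max, self.queue[i] + 1)
            if self.rng.random() < (i + 1) / (n + 1):  # heterogeneous harvesting
                self.battery[i] = min(self.b_max, self.battery[i] + 1.0)
        return cost


def discounted_cost(env, policy, gamma=0.95, horizon=200):
    """Truncated discounted cost sum_t gamma^t c_t of running `policy` in `env`."""
    total, disc = 0.0, 1.0
    for _ in range(horizon):
        total += disc * env.step(policy(env))
        disc *= gamma
    return total


def no_share(env):
    return [0.0] * env.n


def share_if_richer(env):
    """Simple heuristic: give 30% of the battery to a poorer clockwise neighbour."""
    return [0.3 if env.battery[i] > env.battery[(i + 1) % env.n] else 0.0
            for i in range(env.n)]


if __name__ == "__main__":
    for name, policy in (("no sharing", no_share), ("share if richer", share_if_richer)):
        print(name, round(discounted_cost(SharingNetwork(seed=0), policy), 2))
```

A trained DDPG actor would replace the hand-written policies, mapping the observed batteries and queues to `share_frac`.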
Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
Learning optimal behavior from existing data is one of the most important
problems in Reinforcement Learning (RL). This is known as "off-policy control"
in RL, where an agent's objective is to compute an optimal policy from the
data generated by a given policy (known as the behavior policy). Because the
optimal policy can be very different from the behavior policy, learning optimal
behavior is much harder in the "off-policy" setting than in the "on-policy"
setting, where data generated by the updated policy is used for learning.
This work proposes an off-policy natural actor-critic algorithm that utilizes
state-action distribution correction for handling the off-policy behavior and
the natural policy gradient for sample efficiency. Existing natural
gradient-based actor-critic algorithms with convergence guarantees require
fixed features for approximating both the policy and the value function. This often
leads to sub-optimal learning in many RL applications. In contrast, our
proposed algorithm uses compatible features that allow arbitrary neural
networks to approximate the policy and the value function while guaranteeing
convergence to a locally optimal policy. We illustrate the benefit of
the proposed off-policy natural gradient algorithm by comparing it with the
vanilla gradient actor-critic algorithm on benchmark RL tasks.
Comment: This paper has been accepted for presentation at IJCNN at IEEE
WCCI 2022 and for publication in the conference proceedings published by IEEE
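The role of compatible features can be illustrated in the simplest possible setting: a single-state problem (a multi-armed bandit) with a tabular softmax policy, where the compatible features are ψ(a) = ∇θ log π(a) = e_a − π. Fitting critic weights w on these features by importance-weighted least squares yields (approximately) the natural gradient direction, so the actor update is simply θ ← θ + αw. This is an illustrative sketch, not the paper's algorithm: there is no neural network, the state-action distribution correction reduces to the action ratio π(a)/μ(a) because there is only one state, and the Gaussian rewards, ridge term, batch size and step size are assumptions. The function names are hypothetical.

```python
import math
import random


def softmax(theta):
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def off_policy_nac(means, behavior, iters=50, batch=500, lr=0.5, ridge=1e-3, seed=0):
    """Off-policy natural actor-critic on a bandit with Gaussian rewards N(means[a], 1)."""
    rng = random.Random(seed)
    k = len(means)
    theta = [0.0] * k
    for _ in range(iters):
        pi = softmax(theta)
        # Data always comes from the fixed behaviour policy mu, never from pi.
        samples = []
        for _ in range(batch):
            a = rng.choices(range(k), weights=behavior)[0]
            r = rng.gauss(means[a], 1.0)
            samples.append((a, r, pi[a] / behavior[a]))  # importance weight pi/mu
        baseline = sum(rho * r for _, r, rho in samples) / sum(s[2] for s in samples)
        # Compatible critic: ridge-regularised weighted least squares of (r - baseline)
        # on psi(a) = grad log pi(a) = e_a - pi, i.e. solve (F + ridge*I) w = g.
        F = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
        g = [0.0] * k
        for a, r, rho in samples:
            psi = [(1.0 if i == a else 0.0) - pi[i] for i in range(k)]
            for i in range(k):
                g[i] += rho * psi[i] * (r - baseline) / batch
                for j in range(k):
                    F[i][j] += rho * psi[i] * psi[j] / batch
        w = solve(F, g)  # approximately the natural gradient F^{-1} g
        theta = [t + lr * wi for t, wi in zip(theta, w)]  # natural-gradient actor step
    return softmax(theta)


if __name__ == "__main__":
    # Uniform behaviour policy; the learned target policy should concentrate on arm 1.
    print([round(p, 3) for p in off_policy_nac([0.0, 1.0, 0.5], [1 / 3] * 3)])
```

Here `g` is an importance-weighted estimate of the ordinary (vanilla) policy gradient, so stepping along `g` instead of `w` gives a vanilla-gradient counterpart of the same sketch.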